Call Status.Update once in each reconcile attempt #494
There are two TODOs in this PR. Do you want to address them in this PR, or as part of the review?
Let's discuss the TODO related to event emission in this PR. I removed the other TODO (related to updating child Job statuses); that was just a note for myself while investigating the integration test "resource has been modified, try again" errors, which I'm no longer attempting to address in this PR. I think you're right that if these errors also appear in the Kueue integration test logs, they are likely expected. Instead of going down that rabbit hole, we can look into ways to filter those log lines from the integration test logs if we want to clean that up.
Need to refactor unit tests for this, will comment when it's ready for another look.
/hold I'll continue work on this after the v0.5.0 release
/label tide/merge-method-squash Thanks!
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: ahg-g, danielvegamyhre
Fixes #405
This approach removes the individual JobSet status update API calls from the controller. Instead, the controller modifies the status in memory and makes a single status update call at the end of each reconciliation attempt.
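As a rough illustration of the pattern described above (not the code in this PR), the sketch below mutates a status struct in memory across several reconcile steps and issues one write at the end. All type and function names here (JobSet, statusWriter, countingWriter, reconcile, the condition type strings) are simplified, hypothetical stand-ins rather than the real JobSet API or controller-runtime client. Skipping the write when nothing changed is an assumption of the sketch.

```go
package main

import "fmt"

// Condition, JobSetStatus and JobSet are simplified, hypothetical stand-ins
// for the real API types.
type Condition struct {
	Type   string
	Status string
}

type JobSetStatus struct {
	Restarts   int
	Conditions []Condition
}

type JobSet struct {
	Name   string
	Status JobSetStatus
}

// statusWriter is a hypothetical stand-in for a client status update call.
type statusWriter interface {
	UpdateStatus(js *JobSet) error
}

// countingWriter records how many status writes were issued.
type countingWriter struct{ calls int }

func (w *countingWriter) UpdateStatus(js *JobSet) error {
	w.calls++
	return nil
}

// setCondition changes the in-memory status only; it makes no API call.
func setCondition(js *JobSet, c Condition) {
	for i := range js.Status.Conditions {
		if js.Status.Conditions[i].Type == c.Type {
			js.Status.Conditions[i] = c
			return
		}
	}
	js.Status.Conditions = append(js.Status.Conditions, c)
}

// statusEqual reports whether two statuses hold the same values.
func statusEqual(a, b JobSetStatus) bool {
	if a.Restarts != b.Restarts || len(a.Conditions) != len(b.Conditions) {
		return false
	}
	for i := range a.Conditions {
		if a.Conditions[i] != b.Conditions[i] {
			return false
		}
	}
	return true
}

// reconcile runs every step against the in-memory object, then writes the
// status at most once, and only if a step actually changed it.
func reconcile(w statusWriter, js *JobSet, steps []func(*JobSet)) error {
	before := JobSetStatus{
		Restarts:   js.Status.Restarts,
		Conditions: append([]Condition(nil), js.Status.Conditions...),
	}
	for _, step := range steps {
		step(js)
	}
	if statusEqual(before, js.Status) {
		return nil
	}
	return w.UpdateStatus(js)
}

func main() {
	w := &countingWriter{}
	js := &JobSet{Name: "demo"}
	steps := []func(*JobSet){
		func(js *JobSet) { js.Status.Restarts++ },
		func(js *JobSet) { setCondition(js, Condition{Type: "ExampleA", Status: "True"}) },
		func(js *JobSet) { setCondition(js, Condition{Type: "ExampleB", Status: "False"}) },
	}
	if err := reconcile(w, js, steps); err != nil {
		fmt.Println("reconcile failed:", err)
		return
	}
	fmt.Println("status writes:", w.calls)
}
```

Three steps modify the status here, yet the writer is invoked a single time, which is the property the PR title describes.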
These changes also include:
- Removal of the forceFalseUpdate flag from conditionOpts, since we are no longer using a condition status of false/true on the in-order startup policy condition to indicate whether the startup policy is in progress or completed (see prior bullet point).

Note: the operation conflict errors are still present in the integration tests. One idea I had is that we are still doing Job status updates in the middle of the reconcile logic. These trigger JobSet reconciles, which then update jobSet.status.replicatedJobStatuses as the controller looks at those child Jobs. The controller-runtime work queue should be handling these sequentially, though, since MaxConcurrentReconciles defaults to 1 and we don't override it, so I'm not sure of the root cause yet. Kueue has these errors in its integration test logs as well, so they may be an expected side effect of the integration test framework or something similar.
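For context on what an operation conflict error means in general, here is a minimal sketch of optimistic concurrency: a write is rejected when it is based on a stale version of the object. It is not a diagnosis of the errors mentioned in the note. The store and object types are hypothetical, and the integer ResourceVersion is a simplification (the real field is an opaque string).

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict mimics the kind of error returned for a stale write.
var errConflict = errors.New("conflict: the object has been modified; please apply your changes to the latest version and try again")

// object is a hypothetical resource with a version counter.
type object struct {
	ResourceVersion int
	Ready           int
}

// store is a hypothetical in-memory stand-in for an API server.
type store struct{ current object }

// Get returns a copy of the stored object, including its version.
func (s *store) Get() object { return s.current }

// Update accepts a write only if it was based on the latest version.
func (s *store) Update(o object) error {
	if o.ResourceVersion != s.current.ResourceVersion {
		return errConflict
	}
	o.ResourceVersion++
	s.current = o
	return nil
}

func main() {
	s := &store{}

	// Two writers read the same version of the object.
	a := s.Get()
	b := s.Get()

	a.Ready = 1
	fmt.Println("first write:", s.Update(a))

	// The second writer still holds the old version, so its write is rejected.
	b.Ready = 2
	fmt.Println("stale write:", s.Update(b))

	// Re-reading the object and reapplying the change succeeds.
	b = s.Get()
	b.Ready = 2
	fmt.Println("retry:", s.Update(b))
}
```

In this model, any write that lands between a read and the matching update produces the conflict, regardless of which component made the intervening write.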